April 1946 


BIOMETRICS


THE BIOMETRICS SECTION, AMERICAN STATISTICAL ASSOCIATION 


THE COVARIANCE ANALYSIS OF MULTIPLE 
CLASSIFICATION TABLES WITH UNEQUAL SUBCLASS NUMBERS 
L. N. 
Western Sheep Breeding Laboratory 
U. S. Department of Agriculture 

Data which can be classified simultaneously in two or more ways with unequal numbers
in the various subclasses are commonly found in many kinds of experimental work, particularly
where unequal numbers are characteristic of the parent populations. Frequently, too, the
attribute under study may be associated with one or more independent variables which exhibit
continuous variation. When independent variables and multiple classifications with unequal
subclass numbers occur in the same group of data, some problems in analysis arise which have
not been extensively dealt with. This report is intended to give an outline of a method suitable
for the analysis of data with the above properties.


Table 1. Numbers, weights, ages and inbreeding of lambs classed according to age of dam
and type of birth

                                                          Age of dam
Type of birth                Attribute                 Mature dams   Young dams   Total

Single lambs ($T_1$)         Number                    179           105          284
                             Weight (pounds) Y                                    22,927
                             Age (days) A                                         34,031
                             Inbreeding (percent) P                               3,145.8

Twin lambs ($T_2$)           Number                    117           18           135
                             Weight (pounds) Y                                    9,560
                             Age (days) A                                         16,291
                             Inbreeding (percent) P                               1,171.2

Twin lambs                   Number                    56            3            59
reared singly ($T_3$)        Weight (pounds) Y                                    4,559
                             Age (days) A                                         7,082
                             Inbreeding (percent) P                               700.8

Total                        Number                    352           126          478
                             Weight (pounds) Y         27,955        9,091        37,046
                             Age (days) A              42,249        15,155       57,404
                             Inbreeding (percent) P    3,502.3       1,515.5      5,017.8

$S(Y^2) = 2{,}963{,}362$
$S(A^2) = 6{,}904{,}328$
$S(P^2) = 92{,}156.64$


The lamb weights summarized in Table 1 may be used to illustrate some of the problems and 
the method of analysis. The lambs are grouped according to age of dam into two classes and 
according to type of birth into three classes, making a total of six subclasses. The extreme 
disproportion in subclass numbers is due to the fact that mature ewes normally produce a 
greater proportion of twin lambs than do young ewes. In addition to the two major classifica- 
tions, two independent variables, the age in days at weighing and percent inbreeding may be 
associated with weight of the lambs. 

The practical statistician is usually concerned with obtaining efficient and unbiased estimates 
of certain population parameters from a given body of data, the appropriate method of analysis 
depending upon the parameters to be estimated. In the present case, the method of analysis 
is specifically designed to eliminate from each estimate possible inequalities due to the other 
factors insofar as this is possible. For example, the difference in average weight between lambs 
having mature and young dams is 7.27 pounds. Evidently this difference does not provide an 
unbiased estimate of the effect of age of dam unless we are prepared to ignore possible dis- 
crepancies due to the disproportionate subclass numbers and to differences in age and inbreeding. 

Yates (2) presented a method of applying least squares procedure to the analysis of multiple 
classification tables with unequal numbers in the subclasses. This method, frequently described 
as the method of fitting constants, can be extended without difficulty to include independent vari- 
ables. In the present example, one constant must be fitted for the Y-intercept, two for age of dam, 
three for type of birth, and one each for age and percent inbreeding. Each observation is defined 
by the equation: 

(1)  $Y = a + d_1D_1 + d_2D_2 + t_1T_1 + t_2T_2 + t_3T_3 + b_aA + b_pP + E.$

Here the lower case letters represent the constants to be fitted, and the capital letters are
defined as follows: $D_1$ and $D_2$ take the values 1 and 0, or 0 and 1, depending upon whether an
observation lies in the mature dam group or in the young dam group; $T_1$, $T_2$, and $T_3$ take the
values 1, 0, and 0, or 0, 1, and 0, or 0, 0, and 1, depending upon where an observation lies with
respect to type of birth; A and P are the values for age and percent inbreeding which are
associated with that particular observation, while E represents the error. The least squares
procedure provides estimates of the constants such that $S(E^2)$ is a minimum as compared with
any other set of additive and linear constants.
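
The coding of the capital letters may be easier to follow from a small sketch. The function below is only an illustration in Python; the names `design_row`, `DAM_CLASSES` and `BIRTH_TYPES`, and the example lamb, are assumptions made here and do not appear in the original data.

```python
from typing import List

# Hypothetical labels for the two classifications of equation 1.
DAM_CLASSES = ("mature", "young")                       # D1, D2
BIRTH_TYPES = ("single", "twin", "twin_reared_singly")  # T1, T2, T3


def design_row(dam: str, birth: str, age_days: float, inbreeding_pct: float) -> List[float]:
    """Return [1, D1, D2, T1, T2, T3, A, P] for one observation."""
    d = [1.0 if dam == c else 0.0 for c in DAM_CLASSES]
    t = [1.0 if birth == c else 0.0 for c in BIRTH_TYPES]
    if sum(d) != 1.0 or sum(t) != 1.0:
        raise ValueError("unknown dam class or type of birth")
    # The leading 1 multiplies the intercept a.
    return [1.0] + d + t + [float(age_days), float(inbreeding_pct)]


# Hypothetical lamb: a twin from a young dam, 118 days old, 12.5 percent inbred.
row = design_row("young", "twin", 118, 12.5)
print(row)  # [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 118.0, 12.5]
```

Each observation contributes one such row; the sums of products of these rows with one another and with Y are the coefficients and right hand terms of the least squares equations.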

Summing equation 1 over the 478 observations in Table 1 we obtain the equation:

(2)  $478a + 352d_1 + 126d_2 + 284t_1 + 135t_2 + 59t_3 + 57{,}404b_a + 5017.8b_p = 37{,}046,$

which can be used subsequently in estimating the Y-intercept a. The least squares method of
setting up the simultaneous equations is to multiply equation 1 by each variable ($D_1$, $D_2$, $T_1$, etc.),
summing over all observations. Using the data in Table 1, this procedure gives the following
equations:

(3)  $352a + 352d_1 + 0d_2 + 179t_1 + 117t_2 + 56t_3 + 42{,}249b_a + 3502.3b_p = 27{,}955$

(4)  $126a + 0d_1 + 126d_2 + 105t_1 + 18t_2 + 3t_3 + 15{,}155b_a + 1515.5b_p = 9091$

(5)  $284a + 179d_1 + 105d_2 + 284t_1 + 0t_2 + 0t_3 + 34{,}031b_a + 3145.8b_p = 22{,}927$

(6)  $135a + 117d_1 + 18d_2 + 0t_1 + 135t_2 + 0t_3 + 16{,}291b_a + 1171.2b_p = 9560$

(7)  $59a + 56d_1 + 3d_2 + 0t_1 + 0t_2 + 59t_3 + 7082b_a + 700.8b_p = 4559$

(8)  $57{,}404a + 42{,}249d_1 + 15{,}155d_2 + 34{,}031t_1 + 16{,}291t_2 + 7082t_3 + 6{,}904{,}328b_a + 602{,}971.6b_p = 4{,}452{,}731$

(9)  $5017.8a + 3502.3d_1 + 1515.5d_2 + 3145.8t_1 + 1171.2t_2 + 700.8t_3 + 602{,}971.6b_a + 92{,}156.64b_p = 378{,}947.0$

The equations having the leading terms $d_1$ and $d_2$ can be eliminated simultaneously from
the other equations, as can those having the leading terms $t_1$, $t_2$, and $t_3$. By successive elimination
the equations can be reduced to the following:

(3a)  $83.821d_1 - 83.821d_2 = 859.379$

(4a)  $-83.821d_1 + 83.821d_2 = -859.379$

(5a)  $104.550t_1 - 73.281t_2 - 31.269t_3 = 1189.722$

(6a)  $-73.281t_1 + 91.737t_2 - 18.456t_3 = -1108.822$

(7a)  $-31.269t_1 - 18.456t_2 + 49.725t_3 = -80.900$

(8a)  $10{,}472.218b_a = 5130.711$

(9a)  $38{,}498.842b_p = -10{,}639.150$


It is evident that equations 3a and 4a are not independent since their sum is zero. This
is consistent with the fact that only 1 degree of freedom exists for age of dam. In addition,
equations 5a, 6a, and 7a add to zero, this being consistent with the 2 degrees of freedom for
type of birth. Hence an additional relation must be established between $d_1$ and $d_2$ and among
$t_1$, $t_2$ and $t_3$ to proceed with the analysis. We may take $d_1 + d_2 = 0$ and $t_1 + t_2 + t_3 = 0$,
using the appropriate relation to replace one of the above equations in solving for the respective
constants. With these restrictions, estimates of the constants are as follows:


$d_1 = 5.12628$          $t_3 = 0.06776$

$d_2 = -5.12628$         $b_a = 0.48994$

$t_1 = 6.67416$          $b_p = -0.27635$

$t_2 = -6.74192$


These estimates being obtained, equation 2 may be solved, the estimate of a being 17.07158. 
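
As a check on the arithmetic, the equations may also be solved directly by machine. The sketch below is an assumption about how this could be done today with NumPy rather than by successive elimination; it takes the coefficients of equations 2 to 9 as printed and replaces equations 4 and 7 by the restrictions that the dam constants and the birth-type constants each sum to zero. The variable names are hypothetical.

```python
import numpy as np

# Coefficients of equations 2 to 9; unknowns ordered a, d1, d2, t1, t2, t3, b_a, b_p.
N = np.array([
    [478.0,   352.0,   126.0,   284.0,   135.0,   59.0,   57404.0,   5017.8],
    [352.0,   352.0,   0.0,     179.0,   117.0,   56.0,   42249.0,   3502.3],
    [126.0,   0.0,     126.0,   105.0,   18.0,    3.0,    15155.0,   1515.5],
    [284.0,   179.0,   105.0,   284.0,   0.0,     0.0,    34031.0,   3145.8],
    [135.0,   117.0,   18.0,    0.0,     135.0,   0.0,    16291.0,   1171.2],
    [59.0,    56.0,    3.0,     0.0,     0.0,     59.0,   7082.0,    700.8],
    [57404.0, 42249.0, 15155.0, 34031.0, 16291.0, 7082.0, 6904328.0, 602971.6],
    [5017.8,  3502.3,  1515.5,  3145.8,  1171.2,  700.8,  602971.6,  92156.64],
])
rhs = np.array([37046.0, 27955.0, 9091.0, 22927.0, 9560.0, 4559.0, 4452731.0, 378947.0])

# The eight equations are not independent, so two of them are replaced
# by the restrictions d1 + d2 = 0 and t1 + t2 + t3 = 0.
A = N.copy()
b = rhs.copy()
A[2] = [0, 1, 1, 0, 0, 0, 0, 0]   # replaces equation 4
b[2] = 0.0
A[5] = [0, 0, 0, 1, 1, 1, 0, 0]   # replaces equation 7
b[5] = 0.0

solution = np.linalg.solve(A, b)
names = ["a", "d1", "d2", "t1", "t2", "t3", "b_a", "b_p"]
for name, value in zip(names, solution):
    print(f"{name:>3} = {value: .5f}")
```

The values printed should agree with the constants listed above to within the rounding of the tabulated sums.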

Unbiased and efficient estimates of the differences between the main classes, with other
factors constant, can now be calculated. For example, the difference in weight between lambs
having mature and young dams is $d_1 - d_2 = 10.25$ pounds. The difference between single and
twin lambs is $t_1 - t_2 = 13.42$ pounds, and that between single lambs and twins reared singly is
$t_1 - t_3 = 6.61$ pounds. The constants $b_a$ and $b_p$ are partial regression coefficients measuring
the average change in weight associated with one unit increase in age and percent inbreeding,
respectively.

The problem of setting up tests of significance logically follows that of estimation. The 
sum of the squared deviations from the general mean is 


$2{,}963{,}362 - (37{,}046)^2/478 = 92{,}219.5$


The total reduction due to the several sources of variation including both direct and combined
effects is $b_1S(X_1Y) + b_2S(X_2Y) + \dots - (SY)^2/n$ with the usual least squares procedure.
Applying this to the present example wherein $S(X_iY)$ is given by the right hand terms of equations
2 to 9, we have $17.07158(37{,}046) + 5.12628(27{,}955) + \dots - (37{,}046)^2/478 = 23{,}716.9$.
The direct reduction associated with the ith variable is $b_iS(x_iy)$, where $S(x_iy)$ is the right hand
term of the ith normalized equation obtained by eliminating the association with other variables.
Applying this principle to the normalized equations for the D, T, A and P variables (equations
3a to 9a), the sum of squares directly associated with each of the major sources of variation is
as follows:


Age of dam         = (5.12628)(859.379) + (-5.12628)(-859.379)        =  8,810.8
Type of birth      = (6.67416)(1189.722) + (-6.74192)(-1108.822)
                       + (0.06776)(-80.900)                           = 15,410.5
Age                = (0.48994)(5130.711)                              =  2,513.7
Percent inbreeding = (-0.27635)(-10,639.150)                          =  2,940.1

Reduction in sum of squares due to direct effects                     = 29,675.1
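
The direct reductions are simple sums of products and can be verified in a few lines. The sketch below only repeats the arithmetic above in Python, using the constants and the right hand terms of equations 3a to 9a; the variable names are assumptions made for the illustration.

```python
# Fitted constants paired with the right hand terms of equations 3a to 9a.
dam = [(5.12628, 859.379), (-5.12628, -859.379)]
birth = [(6.67416, 1189.722), (-6.74192, -1108.822), (0.06776, -80.900)]
age = [(0.48994, 5130.711)]
inbreeding = [(-0.27635, -10639.150)]


def direct_reduction(pairs):
    """Sum of constant times right hand term over the constants of one source."""
    return sum(constant * term for constant, term in pairs)


ss_dam = direct_reduction(dam)
ss_birth = direct_reduction(birth)
ss_age = direct_reduction(age)
ss_inbreeding = direct_reduction(inbreeding)
ss_direct_total = ss_dam + ss_birth + ss_age + ss_inbreeding

for label, value in [("Age of dam", ss_dam), ("Type of birth", ss_birth),
                     ("Age", ss_age), ("Percent inbreeding", ss_inbreeding),
                     ("All direct effects", ss_direct_total)]:
    print(f"{label:<20} {value:10.1f}")
```

Small differences in the last figure arise from rounding the constants to five decimals.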


The direct reduction due to a given source is the difference between the total reduction
with all variables included and that which would be obtained if that particular variable were
ignored altogether. This may be illustrated by considering one of several alternative methods
of computing the total reduction:


Age of dam (other sources ignored)                         4,899.7
Type of birth (age of dam fixed)                          13,496.9
Age (age of dam and type of birth fixed)                   2,382.9
Inbreeding (age of dam, type of birth and age fixed)       2,940.1

Total reduction in sum of squares                         23,719.6


The difference between the present figure and that given above is due to small errors in 
rounding the regression coefficients. 

In the usual least squares procedure dealing with continuous variables, the total reduction 
will always be equal to or greater than the sum of the direct effects, but this may or may not be 
true of discrete classifications with unequal subclass numbers. In this particular example, a 
part of the true difference between ages of dam or types of birth is concealed in the marginal 
totals because the twin lambs are proportionately more numerous in the mature dam group. 
This is illustrated by the reduction in the sum of squares due to age of dam; the reduction with 
other sources ignored is 4,899.7, while with other sources fixed it is 8,810.8. Differences in 
average age and percent inbreeding evidently contribute somewhat to the discrepancy but their 
effects are much less important than the disproportionate subclass numbers. 

If we are content to assume without further question that equation 1 is adequate to describe 
the manner in which the main effects combine, the analysis of variance can be set up as shown 
in Table 2. The difference between the total sum of squares and the total reduction provides an 
error term for testing the significance of the direct reduction in variance due to each of the main 
sources. It is evident that all of the main effects are highly significant sources of variation, age 
of dam and type of birth being more important than age and percent inbreeding. 


Table 2. Analysis of variance for weight of lambs


                          Degrees of
Source of variation       freedom       Sum of squares    Mean squares

Total reduction           5             23,716.9
Error                     472           68,502.6          145.1
Direct effects
  Age of dam              1             8,810.8           8,810.8
  Type of birth           2             15,410.5          7,705.2
  Age                     1             2,513.7           2,513.7
  Inbreeding              1             2,940.1           2,940.1
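
The tests implied by Table 2 are ratios of each direct mean square to the error mean square. The following sketch, with hypothetical variable names, forms those ratios from the tabulated sums of squares; the resulting values would then be compared with tabled values of F for the stated degrees of freedom.

```python
# Error line of Table 2.
error_ss, error_df = 68502.6, 472
error_ms = error_ss / error_df

# Direct effects of Table 2: (degrees of freedom, sum of squares).
direct_effects = {
    "Age of dam": (1, 8810.8),
    "Type of birth": (2, 15410.5),
    "Age": (1, 2513.7),
    "Inbreeding": (1, 2940.1),
}

# Variance ratio of each direct mean square to the error mean square.
f_ratios = {
    source: (ss / df) / error_ms
    for source, (df, ss) in direct_effects.items()
}

for source, ratio in f_ratios.items():
    df = direct_effects[source][0]
    print(f"{source:<15} F({df}, {error_df}) = {ratio:6.1f}")
```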


In some cases we may wish to examine the data more closely with respect to their conform- 
ance to the rules prescribed in equation 1. This equation is about as simple as any which can 
be expected to fit the data (since all of the main effects have been found significant) but no 
assurance is provided that the main effects, and the relations between them, are actually as 
simple as described therein. For example, the assumption is made in setting up equation 1 
that differences in weight due to age of dam and type of birth combine additively, or without 
interaction. Yates (2) pointed out that the method of fitting constants provided a valid test 
of the interaction, but not of the main effects where interaction exists. He suggested the method 
of weighted squares of means for the analysis of data with real interaction. In the present 
case, we may compare the actual subclass means with “expected” means derived from the constants
previously calculated, testing in this manner the interaction between age of dam and type
of birth which is assumed nonexistent in equation 1. The actual and expected subclass means,
and the differences between them, are given in Table 3. The sum of squares due to interaction
may be obtained by squaring each difference, multiplying by the appropriate subclass number,
and summing over the 6 subclasses. Since the sum of squares for interaction is 108.1 and this
source of variation has 2 degrees of freedom, the mean square, about 54, is less than the error
mean square of 145.1 in Table 2. Hence, the evidence from the data confirms, or at least does not
refute, the hypothesis of an additive relation between the main effects for age of dam and type of birth.


Table 3. Actual and expected mean weights in the subclasses


Type of birth          Mean           Mature dams    Young dams

Single lambs           Number         179            105
                       Actual         84.844         73.714
                       Expected       84.692         73.970
                       Difference     0.152          -0.256

Twin lambs             Number         117            18
                       Actual         71.983         63.222
                       Expected       72.076         62.606
                       Difference     -0.093         0.616

Twin lambs             Number         56             3
reared singly          Actual         77.607         71.000
                       Expected       77.891         65.687
                       Difference     -0.284         5.313
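
The interaction sum of squares can be recomputed from Table 3 as a check. The sketch below follows the rule stated in the text (square each difference, weight by the subclass number, and sum); the variable names are assumptions for the illustration.

```python
# (subclass number, actual mean, expected mean) from Table 3.
subclasses = [
    (179, 84.844, 84.692),  # single lambs, mature dams
    (105, 73.714, 73.970),  # single lambs, young dams
    (117, 71.983, 72.076),  # twin lambs, mature dams
    (18, 63.222, 62.606),   # twin lambs, young dams
    (56, 77.607, 77.891),   # twins reared singly, mature dams
    (3, 71.000, 65.687),    # twins reared singly, young dams
]

ss_interaction = sum(n * (actual - expected) ** 2
                     for n, actual, expected in subclasses)
ms_interaction = ss_interaction / 2   # 2 degrees of freedom
error_ms = 145.1                      # error mean square from Table 2

print(round(ss_interaction, 1))       # 108.1
print(ms_interaction < error_ms)      # True
```

Most of the sum comes from the single subclass of three lambs, which is worth bearing in mind when the interaction is judged.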


The constants fitted for age and for inbreeding are averages of the individual linear 
regressions within the six subclasses. We may wish to examine more fully whether the 
individual regressions for age or for percent inbreeding differ significantly among themselves, 
or whether a curvilinear regression would provide a significantly better fit than a linear 
regression. Since the appropriate procedures for these tests are similar to those used for the 
same purposes in the usual covariance analysis they need not be considered in detail. 

Snedecor and Cox (1) summarized and compared the several methods suitable for the 
analysis of multiple classification tables with unequal subclass numbers. They pointed out that
Yates’ method of fitting constants and conventional methods of analysis of variance yield the 
same results when the subclass numbers are equal. The method of fitting constants, extended 
to include independent variables with continuous distributions, also gives the same results 
with equal subclass numbers as do the conventional methods of covariance analysis. The method 
can be extended to include as many classifications or as many independent variables as desired, 


simply by setting up additional simultaneous equations for the additional classifications or 
variables. 


REFERENCES
1. Snedecor, George W., and Gertrude M. Cox. Disproportionate subclass numbers in tables
of multiple classification. Research Bulletin No. 180. Iowa State College, Ames, Iowa, 1935.
2. Yates, F. The analysis of multiple classifications with unequal numbers in the different
subclasses. J. Amer. Stat. Assn. 29:51-66. 1934.


Entered as second-class matter, May 25, 1945, at the post office at Washington, D. C., under
the Act of March 3, 1879. The Biometrics Bulletin is published six times a year (in February,
April, June, August, October and December) by the American Statistical Association for its
Biometrics Section. Editorial Office: 1603 K Street, N.W., Washington 6, D. C.



STATISTICAL METHODS IN CEREAL CHEMISTRY 


C. H. GOULDEN AND ALLAN E.


Dominion Laboratory of Cereal Breeding 
Dominion Grain Research Laboratory 


Cereal chemistry is essentially a biological 
science and as such the possible uses of statis- 
tical methods are numerous. Since this science 
deals very frequently with differentiating var- 
ieties of field crops for quality characteristics, 
many of the techniques developed for field 
plot work are applicable, and there has been 
a considerable development along this line 
within recent years. 

The analysis of variance is probably the 
most frequently used statistical tool in cereal
chemistry research (13). Applications vary 
from simple examples such as the comparison 
of the mean squares for weight per bushel, 
between and within grades (12), to complex 
examples involving several factors. A typical 
example of the latter is given in the report 
of a study of variability in experimental bak- 
ing (11). The three factors affecting varia- 
bility were flours, laboratories, and baking 
formulas. An outline of one of the analyses is 
as follows: 

                                   Degrees of freedom

Between flours                              4
Between laboratories                        2
Between formulas                            1
Interactions:
  Flours × laboratories                     8
  Flours × baking formulas                  4
  Laboratories × baking formulas            2
Error and triple interaction              727

The triple interaction was found to be insig- 
nificant and was combined with the error. 
This procedure is not to be followed as a gen- 
eral practice but it did not influence the pres- 
ent results. This example illustrates the prin- 
ciple of factorial experimentation as applied 

to cereal chemistry. 

Another experimenter might consider it de- 
sirable to resort to methods more appropriate 
to an elementary demonstration in a physics 
laboratory in which one factor is varied at a 
time, with the others held constant. Thus, as 
the first phase of the investigation several 
laboratories might have tested the same flour 
by the same baking formula, and if the labora- 


tories obtained essentially similar results it 
might be concluded that future figures on loaf 
volume from the various laboratories would be 
comparable. Actually, such a conclusion 
would be correct only under exactly similar 
conditions of experimentation, that is, with 


the same flour and the same baking formula.


Geddes et al. (11) have demonstrated that an
interaction exists between laboratories and 
baking formulas, so that the above hypothetical 
experiment would have missed this extremely 
important result. The demonstration of an 
interaction in this instance means that if the 
different laboratories are using different bak- 
ing formulas, their results are not entirely 
comparable. Experiments in which all except 
one of the factors are held constant are so 
limited in scope that the results are frequently 
of no practical value. 

A further advantage of factorial experiments 
is that the interactions can be submitted to a 
valid test of significance. One measure of in- 
teraction between laboratories and baking 
formulas could be obtained from two separate 
experiments, one with one baking formula 
and the other with a second baking formula. 
The variation in the mean difference between 
the results of the two formulas at the various 
baking laboratories would be a measure of 
interaction, but such a measure of interaction 
could not always be submitted to a test of 
significance. Assuming three laboratories, 
each baking ten loaves, we would have the 
following analysis of variance for each ex- 
periment: 

                                        D.F.

Formula No. 1   Between laboratories      2
                Within laboratories      27

Formula No. 2   Between laboratories      2
                Within laboratories      27

A combined analysis would give

                                        D.F.
Between formulas                          1
Between laboratories                      2
Interaction                               2
Error                                    54


in which the error term combines the sums of
squares and degrees of freedom within laboratories
from the previous separate analyses.
If the experimental conditions were exactly 
comparable this procedure would be justified 
but frequently conditions are not comparable 
and the errors differ for the two experiments. 
In such cases the customary test of significance 
from a direct application of the analysis of 
variance is not valid. Cochran (3) discusses 
this situation in connection with agricultural 
experiments.
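
The degrees of freedom in the combined analysis follow directly from the numbers of formulas, laboratories and loaves. The short sketch below, in which the function name `combined_df` is an assumption made for illustration, reproduces the figures for three laboratories each baking ten loaves under two formulas.

```python
def combined_df(n_formulas: int, n_labs: int, loaves_per_lab: int) -> dict:
    """Degrees of freedom when separate one-formula experiments are combined."""
    # Within-laboratory degrees of freedom in each separate experiment.
    within_each = n_labs * (loaves_per_lab - 1)
    return {
        "Between formulas": n_formulas - 1,
        "Between laboratories": n_labs - 1,
        "Interaction": (n_formulas - 1) * (n_labs - 1),
        "Error": n_formulas * within_each,
    }


df = combined_df(n_formulas=2, n_labs=3, loaves_per_lab=10)
print(df)
# The four lines account for all 60 loaves less one for the general mean.
print(sum(df.values()))  # 59
```

The pooling of the two within-laboratory lines into one error term is exactly the step that requires the errors of the two experiments to be comparable.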

A study on the protein content of bulk 
wheat (16) concerned the relative importance 
of various sources of error leading to disagree- 
ment between determinations on the same 
wheat in different laboratories. The factors 
studied were heterogeneity of bulk wheat, 
sampling error, variation in cleaning proced- 
ure, differences in grinding methods, and ana- 
lytical error. The variance for grinding, for ex- 
ample, was evaluated and from it were removed 
the variations due to analytical error and to 
heterogeneity of wheat. The resulting standard 
error represented the variation due purely to 
grinding. The other factors were handled 
similarly to obtain a basis for ready interpre- 
tation. The authors conclude in one instance 
“that little is gained in precision by carrying 
out replicated analyses of the same sub-sample 
of ground wheat. Improvement in results 
should rather be sought in better sampling and 
cleaning techniques and in a more uniform 
grinding procedure.” 

A similar investigation on alimentary pastes 
(5) was designed to increase the reproduci- 
bility of micro-tests of pastes made from durum 
wheat flour. The factors were absorption, mix- 
ing temperature, mixing time, sheeting tem- 
perature, sheeting (number of times), pressure 
of sheeting rolls and time in the press. Each 
processing factor was studied at different 
levels, so that their interactions could be ex- 
amined. Most of the first order interactions 
exceeded either the 5 percent or 1 percent 
levels of significance, showing that the effect 
of a change in one processing factor depended 
not only on the level of the factor in question 
but also on the level, at the time, of the other 
factors in the experiment. An exceedingly 
interesting technique employed here was the 
use of graphs to demonstrate the meaning of 


the significant first order interactions. 

Correlation and regression techniques are 
applied quite commonly to problems in cereal 
chemistry. Simple applications have been 
used frequently but in studies of more than 
two variates, the investigators have resorted 
to the method of partials and multiples. A 
typical study (12) involved the correlation 
between the total flour yield of wheat, with
weight per bushel, and four forms of physical 
damage to the kernels. From the simple cor- 
relation coefficients, all the partials were cal- 
culated using total flour yield as the dependent 
variable. Finally the multiple correlation was 
determined between weight per bushel and the 
four forms of damage. All simple correlations 
of total flour yield with each of the other 
characters were significant beyond the 5 percent
level. One of the partial correlations,
however, did not reach the 5 percent level.
This coefficient, which measured the associa- 
tion of green kernels with flour yield when 
the other variables were held constant, sug- 
gests “that green kernels have somewhat less 
influence on flour yields than the other classes 
of damaged kernels.” Had the data in this 
experiment been representative of more than 
one crop year, the influence of the various 
forms of damaged kernels might have been 
measured from the partial regression coef- 
ficients. In this way estimates would have 
been obtained of the change in total yield of 
flour per unit change in any one of the inde- 
pendent variables while the others were held 
constant. The significance of each coefficient 
could have been determined with a “t” test, 
and that of their multiple effect by an “F” 
test (20). 

The technique of covariance is proving ex- 
tremely useful. Generally the object of the 
method is to divide heterogeneous correlation 
effects into homogeneous groups. Due to the 
type of material with which cereal chemists 
work, most correlation studies could use covar- 
iance to determine whether the results were 
obtained from homogeneous populations. In 
one study (19, 14) total nitrogen was corre- 
lated with the saccharifying activity of the 
malt extracts for twelve varieties of barley 
grown at twelve stations distributed across 
Canada. In the preliminary analysis the cor- 
relations for stations, varieties, and error were 


0.959, —0.024, and 0.450, respectively. The 
station correlation was determined from the 
12 paired means for the two characters, where 
each was the mean of all varieties at one sta- 
tion. The variety correlation was determined 
from the 12 paired means for each variety 
over all stations. The correlation for the error 
or residual was that arising within stations 
after the variety effect had been removed. 

The most striking feature of this analysis 
is the wide discrepancy between the correla- 
tions of the station means and of the variety 
means. There was an evident relation between
the total nitrogen and saccharifying
activity as the total nitrogen changed from 
station to station. Although the station cor- 
relation represents only 10 degrees of freedom, 
it is highly significant. The variety correla- 
tion is quite insignificant. The genetic factors 
causing differences between the varieties in 
the total nitrogen content of the grain did not 
have a corresponding effect on the saccharify- 
ing activity of the malt extract. This example 
of heterogeneity in a total covariance illus- 
trates the necessity for separating it into its 
component parts. 


The analysis of covariance is also of import- 
ance in testing the homogeneity of regressions. 
In the above study, for example, the variety 
coefficients were tested for heterogeneity. A 
single regression was computed from the total 
sums of squares and products within all varie- 
ties, the residual sum of squares representing 
the variation for 131 degrees of freedom. 
When an individual regression was fitted for 
each variety, the residual sum of squares about 
the 12 lines represented the variation for only 
120 degrees of freedom. The eleven degrees 
of freedom in the difference measured the varia- 
tion of the twelve individual regression co- 
efficients about the “average” regression for 
all varieties. The ratio of these last two var- 
iances answered the question as to whether 
or not all varieties showed the same relation 
between nitrogen content and saccharifying 
activity. By this method it is possible to com- 
pare correlations or regressions that are in 
themselves correlated. 
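
The test of homogeneity of regressions described above may be sketched as follows. This is an illustration only: the function `slope_homogeneity` and the simulated data are assumptions, not the barley data of the study, and the degrees of freedom correspond to a simple within-variety analysis of 12 varieties with 12 observations each.

```python
import numpy as np


def slope_homogeneity(x, y, groups):
    """Compare one pooled within-group regression with separate regressions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)

    sxx = sxy = syy = 0.0
    separate_rss = 0.0
    for g in labels:
        xg = x[groups == g] - x[groups == g].mean()
        yg = y[groups == g] - y[groups == g].mean()
        gxx, gxy, gyy = xg @ xg, xg @ yg, yg @ yg
        sxx, sxy, syy = sxx + gxx, sxy + gxy, syy + gyy
        separate_rss += gyy - gxy ** 2 / gxx     # residual about this group's own line

    pooled_rss = syy - sxy ** 2 / sxx            # residual about the average regression
    n, k = len(y), len(labels)
    df_pooled, df_separate, df_between = n - k - 1, n - 2 * k, k - 1
    f_ratio = ((pooled_rss - separate_rss) / df_between) / (separate_rss / df_separate)
    return df_pooled, df_separate, df_between, f_ratio


# Simulated example: 12 varieties, 12 stations each, a common slope of 2.
rng = np.random.default_rng(0)
variety = np.repeat(np.arange(12), 12)
nitrogen = rng.normal(size=144)
activity = 2.0 * nitrogen + rng.normal(size=144)

result = slope_homogeneity(nitrogen, activity, variety)
print(result[:3])   # (131, 120, 11)
```

The first three values returned match the 131, 120 and 11 degrees of freedom quoted in the text; the last is the variance ratio to be referred to tables of F.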


Covariance may be used in controlling con- 
comitant errors (7) in studies on cereal chem- 
istry. For example, in comparing varieties 


for protein content when samples are taken 
from a wide area, factors such as rainfall or 
nitrogen content of the soil contribute to the 
variation in protein. The effect of one or both 
factors could be removed by covariance with a 
consequent improvement in the precision of 
the test. The method has been applied by 
Crampton and Hopkins (4) and by Eden and 
Fisher (6). 


When an experiment involves several levels 
of a factor, such as different quantities of bro- 
mate in a baking formula, the sum of squares 
due to levels may be separated into linear, 
quadratic, cubic, etc. effects (10, 14, 17). In 
an experiment on loaf volume using four levels 
of bromate and five varieties, an analysis could 
be made as follows: 


                                               D.F.
Varieties                                        4
Treatments:    linear effects                    1
               quadratic effects                 1
               cubic effects                     1
Interactions:  linear effects × varieties        4
               quadratic effects × varieties     4
               cubic effects × varieties         4


If the experimental results were to fall mainly 
on a straight line, the greater portion of the 
sum of squares for treatments would be contained
in the sum of squares for the linear
effect. The sum of squares for quadratic 
effects representing the additional degree of 
freedom in fitting a curve of the second degree 
would be significantly larger than the error if 
there were a definite trend or curvature in one 
direction. A similar approach would apply 
to the cubic and other effects of higher order. 
Possibly the most interesting application of 
this method is in testing the significance of the 
interaction components. The variance of linear 
effects × varieties, for example, measures the
consistency of the slope among varieties. 
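
A small numerical sketch may make the partition concrete. It assumes four equally spaced bromate levels, for which the usual orthogonal polynomial coefficients are (-3, -1, 1, 3), (1, -1, -1, 1) and (-1, 3, -3, 1); the loaf-volume means and the number of loaves per mean are invented for the illustration.

```python
# Orthogonal polynomial coefficients for four equally spaced levels.
CONTRASTS = {
    "linear": (-3, -1, 1, 3),
    "quadratic": (1, -1, -1, 1),
    "cubic": (-1, 3, -3, 1),
}


def partition_treatments(means, reps):
    """Split the treatment sum of squares into single-degree-of-freedom parts."""
    parts = {}
    for name, coeffs in CONTRASTS.items():
        value = sum(c * m for c, m in zip(coeffs, means))
        parts[name] = reps * value ** 2 / sum(c * c for c in coeffs)
    return parts


# Hypothetical mean loaf volumes (cc.) at the four levels, 5 loaves per mean.
means = [600.0, 650.0, 680.0, 690.0]
reps = 5
parts = partition_treatments(means, reps)

grand = sum(means) / len(means)
treatment_ss = reps * sum((m - grand) ** 2 for m in means)

print(parts)          # {'linear': 22500.0, 'quadratic': 2000.0, 'cubic': 0.0}
print(treatment_ss)   # 24500.0
```

The three components add to the treatment sum of squares, so nothing is lost by the subdivision; applying the same coefficients within each variety gives the interaction components.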


The method of fitting polynomials when the 
independent variable occurs at equal intervals 
and with equal weights (7, 10), is proving of 
considerable value. In one study (1), the loaf 
volume of bread from wheat flours was deter- 
mined for equal intervals of mixing time of 
the dough. In early experiments, two loaves 
were baked for each of three mixing times, 
six or seven loaves being the maximum num- 


ber that could be handled for each kind of 
flour. This did not give a sufficient number 
of points for fitting curves which would dis- 
tinguish clearly between the flour types in 
respect to mixing tolerance. The technique 
was changed to using seven mixing times, bak- 
ing one loaf for each, and fitting a polynomial 
to the seven points. The results indicated a 
definite quadratic trend for most flours and 
the quality characteristics of the flour types 
were much more obvious. The rapidity of this 
method is a further advantage. It is therefore 
of considerable value in designing experiments 
that are expected to show trends, to obtain 
the data in such a way that the method can be 
applied. 
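
As an illustration of fitting a curve to seven equally spaced mixing times, the sketch below uses NumPy's general least squares routine `np.polyfit` in place of the tabulated orthogonal polynomials referred to above; the two approaches give the same fitted curve. The loaf volumes are invented and lie exactly on a quadratic so that the result is easy to check.

```python
import numpy as np

# Seven equally spaced mixing times (minutes) and hypothetical loaf volumes (cc.).
mixing_time = np.arange(1.0, 8.0)
loaf_volume = 500.0 + 60.0 * mixing_time - 6.0 * mixing_time ** 2

# Fit a second degree polynomial; coefficients are returned highest power first.
quad, lin, const = np.polyfit(mixing_time, loaf_volume, 2)

# Mixing time at which the fitted curve reaches its maximum.
optimum_time = -lin / (2.0 * quad)

print(round(quad, 3), round(lin, 3), round(const, 3))
print(round(optimum_time, 2))  # 5.0
```

With real data the points would scatter about the curve, and the position and flatness of the maximum would serve to describe mixing tolerance.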

The determination of a regression integral 
involving a regression function varying con- 
tinuously with time (9) has been employed 
to advantage in certain cereal chemistry 
studies. In a study of the effect of the amount 
and distribution of rainfall on the protein con- 
tent of wheat (18), the results were obtained 
by this technique. The data on rainfall and 
protein covered seven stations and a period 
of fourteen years. The rainfall data for each 
year were divided into 25 five-day periods. 

The first step in the method was to express 
the amount and distribution of rainfall at 
each station for each year by the six numerical 
coefficients of a fifth degree polynomial func- 
tion of the time: 

$\rho_0T_0 + \rho_1T_1 + \rho_2T_2 + \rho_3T_3 + \rho_4T_4 + \rho_5T_5,$

the six terms being mutually orthogonal. Then
by virtue of the orthogonal properties of the
functions $T_r$, the regression integral of protein
upon rainfall could itself be expressed
as a fifth degree polynomial function of time
involving the partial regression coefficients of
protein upon the distribution coefficients $\rho_r$.
The final equation showed the effect on protein 
of an additional unit of rainfall at any period 
in the full interval. 
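
The first step, reducing 25 five-day rainfall figures to six distribution coefficients, can be sketched as below. This is an assumption about one convenient way to do it: the orthogonal functions are obtained here by a QR decomposition of the powers of time, which yields polynomials of degree 0 to 5 scaled to unit length, and the rainfall values are simulated. The names `T` and `rho` are chosen for the illustration.

```python
import numpy as np

# 25 five-day periods, centred and scaled to keep the powers well conditioned.
periods = np.arange(25, dtype=float)
t = (periods - periods.mean()) / periods.std()

# Columns 1, t, ..., t^5; QR makes them mutually orthogonal with unit length.
powers = np.vander(t, 6, increasing=True)
T, _ = np.linalg.qr(powers)

# Hypothetical rainfall (inches) for one station in one year.
rng = np.random.default_rng(1)
rainfall = rng.gamma(shape=2.0, scale=0.3, size=25)

# Six distribution coefficients and the fifth degree curve they describe.
rho = T.T @ rainfall
smoothed = T @ rho

print(rho.shape)                          # (6,)
print(np.allclose(T.T @ T, np.eye(6)))    # True
```

With the coefficients computed for every station and year, the second step would be an ordinary multiple regression of protein on the six coefficients, the partial regression coefficients then defining the regression function over time.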


The statistical methods now available to 
the cereal chemist provide him with a valuable 
tool with which he may reap greater returns 
for his laboratory efforts. In any group of 
observations there is only so much informa- 
tion and no more. If efficient statistical meth- 
ods are applied, all of the relevant information 
will be extracted from the data, whereas im- 
proper statistical analysis would lead to loss 
of some of the information and possibly to 
inaccurate conclusions. Furthermore, one 
group of data may contain more useful infor- 
mation than another group which required 
equal time and effort. For this reason it is 
important that an investigation be approached 
statistically at the very outset, when the experiment
is being designed. While the more
complex aspects of a statistical analysis may 
be left to the professional statistician, the 
cereal chemist must understand the logic of 
the methods being applied, for his understanding
of the principles of design and randomization
and the validity of tests is of prime importance
when interpreting the final results
(8, 2, 21).


LITERATURE CITED


1. ... Fisher. Mixing tolerances of varieties of hard red spring wheat.
2. Anderson, J. A. The ... of statistics in technical papers. Transactions, American Assoc.
of Cereal Chemists.
3. Cochran, W. G. Problems arising in the analysis of a series of similar experiments. Suppl.
Jour. Roy. Stat. Soc. IV:102-118, 1937.
4. Crampton, E. M. and J. W. Hopkins. J. Nutrition 8:329-340, 1934.
5. Cunningham, R. L. and J. Ansel Anderson. Micro-tests of alimentary pastes. II. Effects
of processing conditions on paste properties. Cereal Chem. 20:482-
6. Eden, T. and R. A. Fisher. Jour. Agri. Sci. 17:548-562, 1927.
7. Fisher, R. A. Statistical methods for research workers. Oliver and Boyd, London and
Edinburgh.
8. Fisher, R. A. The design of experiments. Oliver and Boyd, London and Edinburgh. Ed.
9. Fisher, R. A. Trans. Roy. Soc. (London) B, 213:89-142, 1925.
10. Fisher, R. A. and F. Yates. Statistical tables for biological, agricultural and medical
research. Oliver and Boyd, ... edition, 1943.
11. Geddes, W. F., R. K. Larmour, and ... Malloch. ... experimental baking. II. The
influence of ... variability in loaf volume between ...
12. ... The milling and baking quality of frosted wheat of the 1928 crop. Can. J.
Research 6:119-155, 1932.


13. Goulden, C. H. Application of the variance analysis to experiments in cereal chemistry. Cereal Chem. 9:239-
14. Goulden, C. H. Experimental design for cereal chemists. Cereal Chem. 21:159-171, 1944.
15. Goulden, C. H. Methods of statistical analysis. John Wiley and Sons, New York, 1939.
16. Hildebrand, F. C. and R. C. Koehn. Sources of error in the determination of the protein content of ooh wheat. Cereal Chem. 21:370-374, 1944.
17. Larmour, R. K. A comparison of hard red spring and hard red winter wheats. Cereal Chem. 18:778-789, 1941.
18. Paull, Allan E. and J. Ansel Anderson. The effects of amount and distribution of rainfall on the protein content of western Canadian wheat. Can. J. Research C 20:212-227, 1942.
19. Sallans, Henry R. and J. Ansel Anderson. Varietal differences in barleys and malts. II. Saccharifying activities of barleys and malts and the correlations between them. Can. J. Research C 16:405-416, 1938.
20. Snedecor, G. W. Statistical methods. Chap. 13. Collegiate Press Inc., Ames, Iowa, 4th Ed.
21. Treloar, Alan E. Statistics in the service of Cereal Chemistry. Cereal Chem. 9:573-590, 1932.


INCOMPLETE-BLOCK DESIGN ADAPTED TO PAIRED TESTS
OF MOSQUITO REPELLENTS
F. M. WADLEY


U. S. Department of Agriculture, Agricultural Research 
Administration, Bureau of Entomology and Plant Quarantine 


The knowledge of mosquito repellents has 
advanced rapidly in the last few years, and test- 
ing methods have received considerable atten- 
tion. The writer has studied variability in deter- 
minations of relative effectiveness of repellents 
made at several places, especially the deter- 
minations made by the Bureau of Entomology 
and Plant Quarantine at Orlando, Fla. The 
testing methods, involving the exposure of 
treated arms of human subjects to adult mos- 
quitos, result in a series of records of individ- 
ual protection periods. The exposures are 
not continuous but are made at short intervals. 
This is not a serious statistical handicap, how- 
ever, with those compounds giving long-time 
protection. 

Early in the studies it was apparent that 
dates of testing and, especially, subjects usual- 
ly showed real variability. There may well 
be variation due to interaction of date and 
subject also; no tests were made which gave 
information on this. The variance between 
right and left arms of a subject at a given 
time seemed to be as low as could be expected 
with the methods used, but even this variance 
was considerable. Date and subject variation 
must be expected in practical use, but for 
close comparisons it is well to limit the var- 
iance to the paired-arm level. Accordingly 
many tests were made of the more promising 


substances paired with the standard repellent. 
This method had its usual drawbacks—that 
a large proportion of the work consisted in 
repeating tests of the standard, and that two 
new substances had to be compared through 
the standard, with a loss of precision. 

Where several substances of already proved 
value were to be compared, it was suggested 
that the “balanced incomplete block” plan be 
tried (Yates 1940). In this case the “block” 
had to consist of the 2 arms of a subject on a 
given date, and hence could contain only 2 
units or “plots.” A trial was made by F. A. 
Morton of the Orlando laboratory in 1945, 
using 5 such substances and the standard. 
Five subjects were used to form 5 complete 
replications. Each subject was used on 3 
dates, which made a total of 15 blocks. The 
6 materials were tested once in every pair 
combination. Thus in Yates’ notation, k = 2,
v = 6, r = 5, b = 15, λ = 1. The mosquito
used was the standard test species, Aedes 
aegypti, and time to the second bite was the 
criterion. The analysis was carried out by 
the method of Yates (1940) for the case where 
groups of blocks form complete replications. 
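
The layout just described can be checked with a short sketch. It verifies the two counting identities that any balanced incomplete block design satisfies (bk = vr and λ(v - 1) = r(k - 1)) and builds one possible grouping of the 15 pairs into 5 complete replications. The function names and the circle-method grouping are illustrative assumptions; the source does not say which pairs were assigned to which subject or date.

```python
def satisfies_bibd_identities(v, b, r, k, lam):
    """Check the two counting identities of a balanced incomplete block design."""
    return b * k == v * r and lam * (v - 1) == r * (k - 1)


def paired_replications(v):
    """Group all pairs of v materials (v even) into v - 1 complete replications.

    Uses the round-robin "circle" method: material v stays fixed while
    materials 1..v-1 rotate. This is one possible arrangement, assumed here.
    """
    ring = list(range(1, v))
    m = len(ring)
    replications = []
    for i in range(m):
        blocks = [(v, ring[i])]
        for j in range(1, v // 2):
            blocks.append((ring[(i + j) % m], ring[(i - j) % m]))
        replications.append(blocks)
    return replications


# Parameters quoted in the text: k = 2, v = 6, r = 5, b = 15, lambda = 1.
print(satisfies_bibd_identities(v=6, b=15, r=5, k=2, lam=1))
for subject, blocks in enumerate(paired_replications(6), start=1):
    print("subject", subject, blocks)
```

Each printed row holds three blocks (one per date) that together contain all six materials once, so every subject forms a complete replication.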

Six such experiments were carried on. In 
all of them real differences between materials 
were found. In each experiment there was a 
real difference between replications (subjects).


In some of the experiments there was no sig- 
nificant variation between blocks after ad- 
justing for materials and replications; in 
others block variation occurred. Where there 
is a real variation between dates there will be 
a significant variance between blocks within 
replications, after allowance is made for var- 
iance between replications and materials. 
Hence absence of such block variance indicates 
absence of differences between dates. 
An analysis of an experiment with real 
block variation is summarized as follows: 
                                  Degrees of    Sum of      Mean
                                  freedom       squares     square
Materials (ignoring blocks)            5         67,631
Replications (subjects)                4         57,215
Blocks (eliminating materials
  and replications)                   10        100,251     10,025
Error (intra-block)                   10         18,260      1,826
Adjusted material means were 242 (the
standard), 342, 355, 156, 267, and 238 minutes,
compared with unadjusted values of 247, 329, 
320, 189, 267, and 248, respectively. The least 
significant difference between adjusted means 
was 74 minutes, at the 5 percent level. The 
variance of a material total is here

$$\frac{kr(v - 1)}{Wv(k - 1) + W'(v - k)},$$

where $W$ is the reciprocal of the error variance
and $W'$ is $(r - 1)/[(r \times \text{block variance}) - (\text{error variance})]$. From this variance the
standard error of a mean difference can easily
be determined.
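
As a rough check, the least significant difference can be recomputed from the mean squares in the table. The sketch below makes two assumptions that the text does not state: that a material mean is the total divided by r, so its variance is the variance of the total divided by r squared, and that the variance of a difference is taken as twice the variance of one mean. The value 2.228 is the tabulated two-sided 5 percent point of t for 10 degrees of freedom; the function name is hypothetical.

```python
import math


def lsd_adjusted_means(v, r, k, error_ms, block_ms, t_value):
    """Least significant difference between two adjusted material means."""
    w = 1.0 / error_ms                               # W: reciprocal of error variance
    w_prime = (r - 1) / (r * block_ms - error_ms)    # W' as defined in the text
    var_total = k * r * (v - 1) / (w * v * (k - 1) + w_prime * (v - k))
    var_mean = var_total / r ** 2                    # assumed: mean = total / r
    se_difference = math.sqrt(2.0 * var_mean)        # assumed: two means, equal variance
    return t_value * se_difference


lsd = lsd_adjusted_means(v=6, r=5, k=2, error_ms=1826, block_ms=10025, t_value=2.228)
print(round(lsd))  # 74
```

Under these assumptions the result agrees with the 74 minutes reported for the 5 percent level.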

The gain of efficiency in this plan over that 
of randomized complete blocks is about 90 
per cent; recovery of inter-block information 
gained 10 percent in efficiency. In some other 
experiments this plan gave little or no gain 
over randomized blocks, but the pair control 
insures utilization of a gain if it is available. 
The gain over complete randomization is large. 
It is not likely, however, that these types of 
randomization would be used in such experi- 
ments. The vital comparison is with the de- 
sign where each test substance is compared 
with a standard in every trial. The incom- 
plete-block design, here used, devoted five- 
sixths of the work to tests, as compared with 
one-half where a standard occurs in every 
pair. Experimental error with incomplete 
blocks is but little higher than that for com- 
parison of test and standard in similar experi- 
ments using a standard in every pair. It is 
definitely lower than that for comparison of 
two test substances through the standard. 

This experimental plan would seem to show 
real promise in certain situations of the type 
discussed. 


REFERENCE
Yates, F. The recovery of inter-block information in balanced incomplete block designs. Ann.
Eugenics 10:317-325, 1940.


IN MEMORIAM 


FORREST RHINEHART IMMER
1899 — 1946 


Dr. Forrest R. Immer, associate director of 
the Minnesota Agricultural Experiment Sta- 
tion, died at St. Paul, Minn., on February 2, 
1946, in his forty-seventh year. In his passing 
the University of Minnesota lost a capable ad- 
ministrator and distinguished plant breeder, 
and biological statistics lost one of its most 
constructive leaders. Dr. Immer is survived 
by his widow, Myrtle Link Immer, and one 
daughter, Ruth Ann. 


Dr. Immer was born in Spencer, Iowa, on 
July 18, 1899. While a small boy he moved 
with his parents, Minnie Maas Immer and 
Albert Immer, to a farm near Jeffers, Minn. 
He received his high school diploma at Win-
dom, Minn., in 1917 and served for a few months
during the first World War. Sub-
sequently he completed his education at the
University of Minnesota, receiving the B.S. 
degree in 1924, the M.S. degree in 1925 and 




the Ph.D. degree in 1927. 


His professional advancement at the Uni-
versity of Minnesota was rapid. From 1927


to 1929 he was an instructor in Plant Genetics 
and in 1929-30 was assistant plant geneticist.
In 1930 he became associate geneticist for the 
Division of Sugar Plants, United States De- 
partment of Agriculture. This position he 
held until 1935, except for the academic year 
of 1930-31 during which he was a fellow of the 
National Research Council, studying statistics 
with Dr. R. A. Fisher at the Rothamsted Ex- 
periment Station, England, and plant breed- 
ing at the Svalof Plant Breeding Station, 
Sweden. He was made associate professor at 
the University of Minnesota in 1935, and full 
professor in 1937. In 1941 he was appointed 
vice director of the Minnesota Agricultural 
Experiment Station and a year later was ele-
vated to the post of associate director.


During 1944, Dr. Immer took time out from 
his administrative duties for special service in 
England with the Eighth Air Force. Assigned 
to the Operations Analysis Section, whose 
duty it was to analyze bombing operations and 
improve bombing accuracy, he served during 
the air war in Europe and received citations 
from General H. H. Arnold and Lieutenant 
General J. H. Doolittle for exemplary service. 


Upon his return in November 1944 to his 
position as associate director of the Minnesota 
Agricultural Experiment Station he plunged 
immediately into the task of fitting agricul- 
tural research to the needs of the postwar 
period. He was given the important posts of 
chairman, North Central Regional Directors, 
Farm Structures Committee; chairman, As- 
sociation of Land Grant Colleges Committee 
on Farm Structures Legislative Bill; chair- 
man, North Central Regional Directors, Poul- 


try Breeding Committee; and member of the 
Crops Section of the American Society of 
Agronomy. 


For many years Dr. Immer has been consult- 
ing editor in statistics of the American Society 
of Agronomy. Since 1935, he held the posi- 
tion of adviser in applied statistics in the 
Minnesota Agricultural Experiment Station. 


At the time of his death, Dr. Immer was on
the Editorial Committee for the Biometrics 
Bulletin of the Biometrics Section of the Am- 
erican Statistical Association. He had been 
very active in the formation of the Biometrics 
Section within the American Statistical As- 
sociation in 1938. 


Dr. Immer’s research has been largely in 
plant breeding with special emphasis on statis-
tical analysis of research data. Many grad-
uate students at the University of Minnesota
came under his direction and guidance in his 
course in Advanced Agricultural Statistics. 
Besides 51 publications, most of them in 
scientific journals, he is author with Dr. H. 
K. Hayes of a standard textbook “Methods of 
Plant Breeding,” published in 1942. 


Among the societies that honored Dr. Immer 
with membership are Alpha Zeta, Sigma Xi, 
Gamma Sigma Delta, American Association 
for the Advancement of Science (fellow), 
Genetics Society of America, American Socie- 
ty of Agronomy, American Statistical Associ- 
ation, and the Science Club of the University 
of Minnesota. 


His integrity and unswerving devotion to 
the highest ideals of science, his kindness and 
untiring devotion to the service of others were 
a lasting inspiration to all those who had 
association with him. 

E. L. LE CLERG
U. S. Department of Agriculture 


WORK IN STATISTICS AT THE 
GEORGE WASHINGTON UNIVERSITY 


All statistics at The George Washington
University is directed and taught in and by
the Department of Statistics. Each teacher
is a specialist in statistical theory and method.
Each teacher is well trained in pure mathe-
matics and in mathematical statistics and in
addition has training in at least one field in
which statistics is applied.


The theme of the members of the depart- 
ment is that statistics is a science and is the 
fundamental and most important part of in- 
ductive logic. It is our purpose to continuous- 
ly strive to connect theory with operation. In 
our opinion, this is one of the most crying 
needs of the day. 


The functions of the department are four-
fold: (1) To teach beginning courses in
statistics (non-mathematical) that are useful 
and desired by other departments wishing the 
subject merely as a tool subject; (2) To teach 
statistics; (3) To assist other departments 
with their problems and in their research where 
statistical method is involved; (4) To con- 
duct and do research in mathematical and 
applied statistics as well as act in a consulta- 
tive capacity. 


The Department offers a complete program 
of courses as may be seen from the University 
Bulletin, so that a student may receive his 
Bachelor’s degree with a major in statistics.
The requirements are one year of philosophy 
including logic, the integral calculus and 24 
credits in statistics. In addition, a student 
may receive the degree Master of Arts or 
Master of Science. Here the student is re- 
quired as a prerequisite to have studied ad- 
vanced calculus, differential equations and 
statistical mathematics. The latter includes 
the pertinent knowledge of theory of func- 
tions, modern algebra, n-dimensional geom- 
etry. In addition he is required to study ad- 
vanced mathematical statistics, modern theories 
and asymptotic laws of probability, multivar-
iate analysis, statistical inference, design of
experiment and testing of hypotheses. A 
reading knowledge in one modern foreign 
language is required. Finally, a student may 
receive the Ph.D. degree in mathematical or 
applied statistics. Here a reading knowledge 
in French and German is required. For this 
degree, the student with guidance of a Master 
prepares himself in at least five fields of 
knowledge. They are analysis and modern 
algebra, mathematical statistics and probabil- 
ity, and one or two fields that are fields of ap- 
plication. This degree is given by The Grad- 


uate Council. Each year teaching fellowships 
are available for prospective students who are 
qualified to begin study for the Doctorate. 
The Department offers its courses during the 
day as well as in the evening so that a full- 
time student as well as an individual who is 
employed and wishes to study, can find an 
adequate program for his needs. Many of 
our students are full-time and others are part- 
time and hold jobs which bring them in con- 
tact with practical problems involving statis- 
tical method. 


Our classes have many students not candid- 
ates for a degree with major in statistics. 
They study the subject for the sake of its 
applications to their major subject and voca- 
tion. Also, many students study statistics as 
an elective for cultural purposes. As well as 
serving our own students who are majoring in 
statistics, we serve groups of students in var- 
ious categories of study and employment in 
statistics who are majoring in education and 
psychology, in the social sciences, in the 
languages and humanities, in philosophy, in 
engineering, in mathematics, in bacteriology 
and physiology, in the physical sciences and 
the biological sciences. 


For the Doctorate, a consultative committee 
of at least five guides the work and study of 
the student leading to the Council Fellowship 
Examinations. This committee is composed 
of individuals who represent the respective 
required fields of study. The Chairman of 
the committee is the Chairman of the Depart- 
ment of Statistics. The Council Fellowship 
Examination consists of six written examina- 
tions each taking from four to six hours to 
complete properly and is given on six success- 
ive days. At the successful completion of 
same, the student is ready for and begins work 
on his dissertation under the direction of his 
Master, a member of the Department of 
Statistics, and is eligible for and becomes a 
Fellow of the Graduate Council. 


The present members of the Department of 
Statistics are: C. R. Cassity, Otto Dekom, 
Solomon Kullback, Dorothy Morrow, A. C. 
Rosander, J. H. Smith, and Frank M. Weida, 
Chairman. 


ABSTRACTS 


(15) 


PRICE, W. C. (University of Pittsburgh). Statistical 
Abstract of Measurement of Virus Activity. 

Two theories regarding the nature of in-
fection by viruses were discussed. The most
tenable hypothesis is that infection is depend- 
ent upon the chance of a single virus particle 
coming into contact with a susceptible region 
of the host. On this basis, the plot of the 
degree of infectivity as a function of concen- 
tration leads to a slope unity over a consider- 
able range. The failure of virus infectivity 
data to conform precisely with this hypothesis 
can be accounted for by assuming a reversible 
aggregation of virus particles. In testing the 
relative activity of a virus sample, the experi- 
mental arrangement, therefore, is such that 
the slope of the infectivity curves can be cal- 
culated directly from the data. With such 
an arrangement, a slight modification of the 
procedure of Bliss and Marks can be used 
for estimating the activity of a virus sample 
and for calculating the standard error of the 
estimate. The accuracy of the method was 
tested in actual practice. 


(16) 


HOMEYER, Paul G. (Iowa State College). An
Analysis of Collaborative Chick Assays for Vitamin 
D. 
An analysis is presented for the data from 
a collaborative study by 31 laboratories of 
chick assays for vitamin D. The relative 
efficiencies of alternative methods of reducing 
the data to an estimated potency of an un- 
known are given. Estimates are made of the 
amount and sources of variation in response 
encountered in these assays and suggestions 
are given for reducing the variation. 


(17) 


BLISS, C. I. (Yale University). An Experimental De- 
sign for Slope-Ratio Assays. 




When the response to a drug is a linear
function of arithmetic dosage units, the rela-
tive potency of two preparations can be com-
puted as a slope-ratio assay. Their dosage-
response curves are computed by solving three
simultaneous equations to obtain the common
intercept $a'$, the slope of the standard, $b_1$, and
the slope of the unknown, $b_2$. The method is
applicable to certain microbiological assays 
for the vitamins. Their calculation is simpli- 
fied when such assays meet the following re- 
quirements: (1) restriction of treatments to 
the zone within which the response is related 
linearly to the dose, (2) equal spacing of 
doses on an arithmetic scale beginning with 
the negative control, (3) an equal number (k) 
of doses of standard and of each unknown 
and (4) one tube for each dose of unknown 
and h tubes for the negative control and for 
each dose of the standard. 

With this design it can be shown that

$$a' = \frac{2(2k + 1)S(y) - 6T_{xy}}{N(k - 1) + 3h(k + 1)}$$

where $N$ is the total number of responses in
the assay, $S(y)$ is the sum of all responses
and $T_{xy}$ is the sum of the products over all m
preparations of each response multiplied by
its coded dose, 1 to $k$. The slope of the
standard is computed as

$$b_1 = \frac{3}{2k + 1}\left[\frac{2S(xy_1)}{hk(k + 1)} - a'\right]$$

where $S(xy_1)$ is determined from the standard.
The slopes for the unknown are obtained
similarly with $h = 1$. In terms of coded
doses, relative potency is computed as

$$J = \frac{b_2}{b_1}$$

which is decoded by multiplying by $i_s/i_u$, the
respective intervals between successive doses
for the standard and for the unknown. By
large sample theory the sampling variance of
$J$ is

$$V(J) = \frac{6s^2}{(2k + 1)b_1^2}\left[\frac{h + J^2}{hk(k + 1)} + \frac{3(1 - J)^2}{N(k - 1) + 3h(k + 1)}\right]$$

where the error variance, with $p$ slopes fitted, is estimated as

$$s^2 = \frac{S(y^2) - a'S(y) - b_1S(xy_1) - \dots - b_pS(xy_p)}{N - p - 1}$$

(18) 


PATTERSON, R. E. (Texas Agricultural Experiment 
Station). The Use of Adjusting Factors in the 
Analysis of Data with Disproportionate Sub-class 
Numbers. 

A method of adjusting is described whereby 
the sums of squares for the various sources of 
variance can be eliminated. When this method 
of adjusting is applied to data with unequal 
subclass numbers, it is possible to obtain a 
sum of squares for each source of variance 
that is free from the influence of the other 
effects. The process of adjusting is accomp- 
lished by substituting in the following equa- 
tion: 

$$X_{rs} - X_s + X = A_{rs}$$
where $X_{rs}$ is the mean of the rth subclass in
the sth row or column, $X_s$ is the mean of the
sth row or column, $X$ is the grand mean and
$A_{rs}$ is the adjusted mean of the rth subclass
in the sth row or column.

The method is based upon the assumption 
that the weighted sum of squares of the sub- 
class means that are adjusted for the border 
mean effects is an efficient estimate of the var- 
iance due to interaction. Justification of this 
assumption is indicated by the fact that the 
difference between the differences of subclass 
means for a given classification is unchanged 
by the adjusting process. It is further demon-
strated that if a sufficient number of adjust-
ings are carried out the results will be the
same as those obtained by the method of fit- 
ting constants. 


(19) 


MOOD, A. M. (Iowa State College). Selection of
Sample Sizes for Detecting Treatment Differences. 

In designing experiments to study treat- 
ment differences, there is often available an 
estimate of the error variance obtained from 
a preliminary experiment or from a previous 
experiment with similar materials. Using this 
estimate and its number of degrees of freedom, 
it is possible to compute the number of degrees 
of freedom necessary in the error variance of 
the projected experiment to detect specified 
difference means. A table has been computed 
whereby one may readily determine the num- 
ber of degrees of freedom so that the prob-
ability will be .80 that a specified treatment
difference will be detected at the .05 level of
significance.


(20) 


DELURY, D. B. (Virginia Polytechnic Institute). The 
Analysis of Latin Squares When Some Observa- 
tions Are Missing. 

The discussion in this paper is given ex-
plicitly for a biological assay which employs
a 4 x 4 latin square in several replications. 
However, the methods are easily adapted to 
any latin square and to various other designs 
as well. 

Methods of analysis, when some observa-
tions are missing, are discussed for the follow-
ing cases:

(1) One or more “single” observations are
missing.

(2) Several columns are missing. 

(3) One column is missing. 

(4) Two columns are missing. 

(5) One column and one or more single 
observations are missing. 

In all of these cases, with the possible ex- 
ception of (5), methods of analysis have been 
available for several years. However, it is 
believed that the approach used in this paper 
leads to considerable simplification in most 
cases. 

Cases (1), (3) and (5) may be treated by 
means of formulae alone; cases (2) and (4) 
require the solution of normal equations. 
Simple systematic methods of setting up these 
normal equations are developed. 


(21) 


HARSHBARGER, Boyd. (Virginia Agricultural Ex- 
periment Station). Rectangular Lattices. 

The paper presents an extension of the 
incomplete block design to the case when the 
number of varieties or treatments is express- 
ible as the product of any two integers with 
an explicit solution for the case where the 
number is k(k - 1). The varieties are ad-
justed by both the inter- and intra-block in-
formation. The name Rectangular Lattice is 
proposed for the design since the word lattice 
carries no implications of squareness. The 
analysis for the Rectangular Lattice is derived 
by the method of fitting constants. This is a 
departure from the method used in the square 


lattice which was based on the analogy with 
a confounded factorial. 

The construction of the Rectangular Lattice 
and the arrangement of the varieties in the 
field, the numerical computations for the 
analysis of variance, the adjustment of var- 
ieties using the recovery of inter-block inform- 
ation, the calculations of the standard errors 
for testing the significance of the differences 
between variety averages, and the efficiency 
of the design relative to complete random-
ized blocks are all given in detail.


The application of the analysis is illustrated 
by applying it to variety tests; however, the 
block has wide applications to other types of 
problems. Fertility or biological treatments 
may be substituted in place of varieties and 
the procedure is the same. With slight modi- 
fications the analysis can be used to measure 
the efficiency of laboratory personnel and to 
estimate the individual biases in measure- 
ments. In engineering the possibilities are 
great. 


QUERIES 


QUERY: We had a problem last spring in test- 
ing a large number of top crosses. Where you 
are going to have a group of crosses running 
into the hundreds, what are you going to do? 
(Query 1) Any design which you may use is 
going to involve a large area and the accuracy 
of the results might be questionable. We 
have a planting of a 12 x 12 triple lattice 
which is also arranged so that a check plot 
occurs every third plot. It seems to us that
this system of check plots gives more reliable
results, especially where such a factor as lodg-
ing is as important as yield. Is this correct?
(Query 2) 


ANSWER: (Query 1) “Where you are going to
have a group of crosses running into the hun-
dreds, what are you going to do?” For such
experiments, various designs have been pro- 
posed, the most extensively used being the in- 
complete block designs devised by Yates. In 
the balanced designs, all variety comparisons 
are made with equal accuracy, as for example 
in the balanced lattices and Youden squares. 
Rather severe restrictions on numbers of re- 
plications in these balanced designs have led 
to the wide use of partially balanced designs 
such as the simple, triple and cubic lattices. 
References: Yates—Annals of Eugenics 9:136 
(1939) and 10:317 (1940); Journal of Agri- 
cultural Science 30:672 (1940); Cox, Eck- 
hardt and Cochran—Iowa Agricultural Ex- 
periment Station Bulletin 281 (1940). 
(Query 2) “Is this correct?” In answer, I 
quote from a mimeographed text on Experi- 


mental Designs by Cochran and Cox: “The 


method of systematic controls is very flexible 
since it can be used with any number of 
treatments and any number of replicates. 
Little evidence is available about the increase 
in accuracy obtained from the controls, though 
it seems probable that the increase is seldom 
large if the extra space occupied by the con- 
trols is taken into account. The calculation 
of the best adjustments . . . is rather tedious, 
while if a crude type of adjustment is made 
most of the potential advantages of the method 
may be lost.” 

In the Journal of Agricultural Science 26: 
424 (1936), Yates expressed the opinion that 
the newer designs (mentioned above) are like- 
ly to be more efficient than randomized blocks 
with systematic check plots. If, for example,
querist had used a 10 x 10 simple lattice design 
for his 96 top crosses with 4 checks or stand- 
ards, he could have gained an extra replica- 
tion and at the same time could have saved 
32 plots. 
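
The plot counts behind this comparison can be set out in a few lines. The sketch assumes that the 12 x 12 triple lattice was grown in its three replications and that the 10 x 10 simple lattice would be grown in four (the two-replicate plan repeated); neither count is stated outright in the answer, but together they reproduce the extra replication and the saving of 32 plots. The names are illustrative.

```python
def total_plots(entries_per_replicate, replicates):
    """Plots needed when every replicate contains every entry once."""
    return entries_per_replicate * replicates


# Querist's plan: 12 x 12 triple lattice with a check on every third plot.
triple_entries = 12 * 12
checks_per_replicate = triple_entries // 3
crosses = triple_entries - checks_per_replicate
triple_plots = total_plots(triple_entries, replicates=3)

# Suggested plan: 10 x 10 simple lattice, 96 crosses plus 4 checks,
# assumed here to be grown in four replicates.
simple_plots = total_plots(10 * 10, replicates=4)

print(crosses, triple_plots, simple_plots, triple_plots - simple_plots)
```

On these assumptions the querist's layout holds 96 crosses in 432 plots, against 400 plots for four replicates of the 10 x 10 design.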

In either design, comparisons among lodg- 
ing percentages are of the same reliability as 
those among yields. 

In his 1936 article, Yates gave the appro- 
priate statistical analysis for systematic check 
plots in randomized blocks. So far as I know 
the analysis for systematic checks in a triple 
lattice has not been published. Of course, 
the experiment may be looked upon as ran- 
domized blocks to which the method of Yates 
is applicable; but the anticipated advantage
of the triple lattice may then have been lost.


—Walter T. Federer 


QUERY: One column in the table of analysis 
of variance is designated by some as Mean 
Square and by others as Variance. Which is 
right? 


ANSWER: There is no categorical answer to 
your question. Apparently, different people 
emphasize different features of the column. It 
may suffice to cite three of these features. 

1. If 5 subsamples, each with 10 items, are
drawn at random from a normal population
with variance, $\sigma^2$, the analysis of variance is:


Source of      Degrees of      Mean
Variation      Freedom         Square
Means               4          $M_1$
Items              45          $M_2$
Total              49          $M_3$


Here, each $M$ is an estimate of $\sigma^2$, and the
column might well be headed, Estimates of
Variance.

2. If 4 male pigs are taken at random from
each of 10 litters from a random selection of
dams of a common breed and having the same
sire, the analysis of variance is:


Source of      Degrees of      Mean
Variation      Freedom         Square
Dams                9          $M_1$
Pigs               30          $M_2$


It may now be assumed that $M_2$ is an estimate
of $\sigma^2$ in a normal population of male pigs un-
differentiated by either sire or dam. But $M_1$
is not an estimate of that same variance; a
component due to dams has been added. If
this added component is denoted by $\sigma_D^2$, then
$M_1$ is an estimate of

$$\sigma^2 + 4\sigma_D^2.$$

This quantity is not the variance of either
pigs or dams, nor is it the variance of litter
means, the latter being $\sigma^2/4 + \sigma_D^2$. It may
seem clearer, therefore, to distinguish its estim-
ate by some such term as Mean Square. But
if this term is put in the column heading it
covers $M_2$ as well, and so some object to it.
Perhaps the heading should be, Mean Square
and Variance.

3. In ordinary experimentation treatments
are not chosen at random; hence, it is not in-
formative to estimate the component of var-
iance, $\sigma_M^2$. Still less is it useful to look upon
$M_1$ as representing any variance pertinent to
the experiment. The goal is reached by com-
puting $F = M_1/M_2$, and the column might be
headed, Terms of F. On the other hand,
Fisher calls $M_1/M_2$ the variance ratio, indicat-
ing the fact that both terms of the fraction
are variances.
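
The second case may be made concrete with a small sketch that computes the two mean squares for equal-sized groups and then the added component as (M1 - M2)/n, which follows from M1 estimating the within-group variance plus n times the component (n = 4 pigs per litter in the example). The function names and the tiny data set are hypothetical and serve only to show the arithmetic.

```python
def group_mean_squares(groups):
    """Return (M1, M2): mean squares between and within equal-sized groups."""
    n = len(groups[0])                      # items per group
    g = len(groups)                         # number of groups
    means = [sum(group) / n for group in groups]
    grand = sum(means) / g
    m1 = n * sum((m - grand) ** 2 for m in means) / (g - 1)
    m2 = sum((y - m) ** 2
             for group, m in zip(groups, means)
             for y in group) / (g * (n - 1))
    return m1, m2


def added_component(m1, m2, n):
    """Estimate of the between-group component when M1 estimates sigma^2 + n * component."""
    return (m1 - m2) / n


# Hypothetical data: three groups of three items each.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
m1, m2 = group_mean_squares(groups)
print(m1, m2, added_component(m1, m2, n=3))
```

The same functions serve for the first case, where both mean squares estimate one variance and the computed component should be near zero apart from sampling error.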

In the foregoing some of the complications 
are ignored, but enough has been said to show 
why I cannot give a definite answer to the 
question. Until some consensus is reached by 
the statisticians, laymen may well follow their 
preferences.

George W. Snedecor


QUERY: I wish to learn if the growth in height 
of the cotton plant may be expressed by 
Robertson’s growth equation,

$$\log \frac{x}{a - x} = k(t - t_1),$$

where $x$ is the height of the plant at any time,
$t$; $a$ is the maximum height attained; and $t_1$
is the time at which $x = a/2$.

I measured the height each 5 days during 
the growing season. A value of k was cal- 
culated by substituting successive pairs of x 
and t in the equation, then averaging the k’s. 
From this average k, I calculated the theoret- 
ical values of x. In testing the fitness between 
the theoretical and observed values of x by 
means of chi-squares, how should the degrees 
of freedom be computed? 


ANSWER: It is not usual to test the goodness 
of fit of a measured variate by chi-square. 
The formula

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Theoretical})^2}{\text{Theoretical}}$$

applies only to frequencies and not to values
of such a variate as height.

The fitness of your measurements to Robert-
son’s formula may be tested graphically by
plotting values of $\log [x/(a - x)]$ against the
corresponding $(t - t_1)$. The points should
indicate a straight line passing through the
origin and having the slope, $k$.

If the graph shows a reasonably good fit, 
you may wish to make more precise tests by 
use of regression. Linear regression fitted to 
your plotted points is satisfactory for most 
practical purposes: for very exacting require- 
ments, the method of least squares may be 
applied directly to the original formula. 

George W. Snedecor
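The regression check described in this answer may be sketched as follows. This is only an illustration under stated assumptions: the logarithm is taken to base 10 (another base merely rescales k), the values of a and $t_1$ are treated as known, and the heights are hypothetical figures generated from the formula itself, not cotton measurements. The function names are invented for the sketch.

```python
import math

def robertson_height(t, a, k, t1):
    """Height implied by log10(x / (a - x)) = k * (t - t1)."""
    return a / (1.0 + 10.0 ** (-k * (t - t1)))

def fit_k_through_origin(times, heights, a, t1):
    """Least-squares slope of log10(x / (a - x)) on (t - t1),
    with the line forced through the origin."""
    v = [t - t1 for t in times]
    u = [math.log10(x / (a - x)) for x in heights]
    return sum(ui * vi for ui, vi in zip(u, v)) / sum(vi * vi for vi in v)

# Hypothetical illustration: a, k_true and t1 are assumed values.
a, k_true, t1 = 120.0, 0.03, 60.0
times = list(range(20, 101, 5))          # a reading every 5 days
heights = [robertson_height(t, a, k_true, t1) for t in times]

k_hat = fit_k_through_origin(times, heights, a, t1)
print(k_hat)
```

With real measurements the plotted points will scatter about the line, and the size of that scatter, rather than a chi-square on the heights, is what indicates how well the formula fits.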


QUERY: A formula often seen for the mean
square “between groups” is $(S_1 - S_2)^2/2k$
where $S_1$ and $S_2$ are the sums in the respective
groups and k is the number of individuals in
each. This is true for two groups of unpaired
variates. If the variates are paired, is the
correct formula for mean square $(S_1 - S_2)^2/k$?


ANSWER: The answer depends on the unit 
used in computation. 

If the querist is using differences between paired
observations as his unit of computation, as is
done in applying the ordinary t-test to unique
samples, and if he then prefers to make the
F-test, his second formula is the correct one.
The reason is that, under the conditions usually
assumed, his differences estimate twice the
variance of the original observations; hence,
the corresponding formula for mean square
between group means is $(S_1 - S_2)^2/k$.

But if the unit of computation is the indi- 
vidual observation, as is customary in analysis 
of variance, then the first formula for mean 
square is the correct one for both group and 
individual comparisons. 

David B. Duncan 
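The distinction drawn in this answer can be checked numerically. The sketch below is an illustration only: the two groups of five values are hypothetical, and the function names are invented for the purpose. It computes both mean squares and shows that, when differences are the unit, the second formula divided by the variance of the differences reproduces the square of the ordinary paired t.

```python
import math

def ms_between_unpaired(g1, g2):
    """(S1 - S2)^2 / 2k, with the individual observation as the unit."""
    k = len(g1)
    return (sum(g1) - sum(g2)) ** 2 / (2 * k)

def ms_between_paired(g1, g2):
    """(S1 - S2)^2 / k, with the paired difference as the unit."""
    k = len(g1)
    return (sum(g1) - sum(g2)) ** 2 / k

def paired_t(g1, g2):
    """Ordinary t for paired differences; also returns their variance."""
    d = [x - y for x, y in zip(g1, g2)]
    k = len(d)
    mean_d = sum(d) / k
    var_d = sum((di - mean_d) ** 2 for di in d) / (k - 1)
    return mean_d / math.sqrt(var_d / k), var_d

# Hypothetical paired observations, k = 5 pairs.
g1 = [12.0, 15.0, 11.0, 14.0, 13.0]
g2 = [10.0, 14.0, 8.0, 13.0, 9.0]

t, var_d = paired_t(g1, g2)
f_paired = ms_between_paired(g1, g2) / var_d   # F on 1 and k - 1 d.f.
print(ms_between_unpaired(g1, g2), ms_between_paired(g1, g2))
print(f_paired, t ** 2)
```

The factor of two between the formulas reflects the change of unit: a difference of two observations carries twice the variance of a single observation.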


QUERY: In an article by Frank Wilcoxon in 
the December issue of the Biometrics Bulletin 


the statement is made that “Table II shows
that the total 3 indicates a probability be- 
tween 0.024 and 0.055 that these treatments 
do not differ.” Although I am not a mathe- 
matician, it seems doubtful to me that the 
author has evaluated the probability stated, 
especially since the table heading indicates a 
different probability. Will you set me straight 
on this? 


ANSWER: The statement in question should 
read, “The probability of obtaining a total of
three or less, under the assumption that the 
treatments do not differ, lies between 0.024 
and 0.055, as is indicated by Table II.” 

If the treatments do not differ, it would be 
expected that the rank total of one sign would 
be distributed about 18 in repetitions of such 
an experiment. There is a definite probability 
of chance occurrence of any possible total of 
one sign or a lesser total, under the assump- 
tion that the treatments do not differ. If this 
probability is sufficiently small, the assump- 
tion that the treatments do not differ is aban- 
doned, and it is concluded that the treatments 
differ. In calculating the probabilities it is 
necessary to double the probability for a total 
of one sign, since an unlikely result may arise 
either through a small negative total or through 
a small positive one. Frank Wilcoxon 
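The probability in question can be reproduced by direct enumeration. The sketch below is an illustration, not the method by which Table II was computed. It assumes 8 paired comparisons, a number inferred from the expected rank total of 18, since n(n + 1)/4 = 18 when n = 8; it also assumes no ties and that each rank is equally likely to carry either sign. The function name is invented for the sketch.

```python
from itertools import combinations

def signed_rank_tail(n, total):
    """P(rank total of one sign <= total) when each of the n ranks
    independently carries that sign with probability 1/2."""
    ranks = range(1, n + 1)
    favourable = sum(
        1
        for size in range(n + 1)
        for subset in combinations(ranks, size)
        if sum(subset) <= total
    )
    return favourable / 2 ** n

n = 8                                  # assumption: 8 pairs
expected_total = n * (n + 1) / 4       # 18.0 under no treatment difference

for total in (2, 3, 4):
    # Doubled, since a small total of either sign is equally unlikely.
    print(total, 2 * signed_rank_tail(n, total))
```

Under these assumptions the doubled probabilities are about 0.023 for a total of 2 or less, 0.039 for 3 or less, and 0.055 for 4 or less, which is close to the bounds of 0.024 and 0.055 quoted for a total of 3.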


NEWS AND NOTES 


The Biological Methods Group of the So- 
ciety of Public Analysts and Other Analytical 
Chemists held its first annual general meeting 
on Monday, February 25, and the provisional 
elections of A. L. BACHARACH as Chair- 
man, and ERIC C. WOOD as Honorary Secre- 
tary, at the inaugural meeting of last October 
were confirmed. Mr. Eric C. Wood, Virol 
Limited, London reports, “The formal business 
was followed by the reading of papers by N. 
T. GRIDGEMAN on ‘The transformation of 
metameters with special reference to vitamin 
D assays,’ and by E. C. FEILLER, entitled 
‘Some remarks on the statistical background of 
bio-assays.’ Mr. Gridgeman is a bio-chemist 
with Lever Bros. and Unilever, Ltd., the large 
Oils and Fats Combine, while Feiller has re- 
cently been appointed as statistician to the 
National Physical Laboratory. Both speakers 


dealt very interestingly with the question of 
transforming measurements of variates into 
other functions for the purpose of normalizing 
distribution, equalizing the variates, or simpli- 
fying the computations. Quite a lively dis- 
cussion took place afterwards.” We would 
like to hear from groups in England or other 
foreign countries regarding their statistical 
activities . . . It is no longer necessary to feel 
sorry for D. J. FINNEY as conditions have 
improved and he is now more comfortably 
situated. In fact, he says that he is getting 
down to serious work once more, and is giving 
an elementary course on “Statistical methods 
in scientific research” ... The Statistical 
Department at Rothamsted Experimental Sta- 
tion now includes FRANK YATES, D. A. 
BOYD, O. KEMPTHORNE and J. W. WEIL.
We are counting on that visit to the States 


before long, Mr. Yates ... JOHN WISHART,
Department of Statistics, School of
Agriculture, University of Cambridge, is in- 
terested in linking mathematical and biolog- 
ical schools in the study of statistics ... A. 
C. FABERGE, formerly a biologist and geneti- 
cist at the Galton Laboratory, London now 
occupies the position of Research Associate 
in the Botany Department at the University 
of Wisconsin, Madison. He gave us consider- 
able information regarding the statisticians in 
England. Some of the news will be reported 
later. Mr. Faberge says, “W. L. STEVENS, 
after spending three years in Portugal teach- 
ing mathematical statistics at Coimbra on be- 
half of the British Council, has taken a job 
as statistical advisor to Imperial Chemical 
Industries, Billingham, Yorks . . . The Galton 
Professorship has been given to L. S. PEN- 
ROSE ... The physiologist H. KALMAS, 
who worked with J. B. S. HALDANE’S de- 
partment, has joined Professor Penrose in the 
position that I held ... M. S. BARTLETT 
has returned from war work to Cambridge 
recently”... A. BRADFORD HILL has suc- 
ceeded MAJOR GREENWOOD in the chair 
of medical statistics in the London School of 
Hygiene and Tropical Medicine at the Univer- 
sity of London. Professor Greenwood has 
been made Professor Emeritus ... DR. E. 
A. CORNISH, Commonwealth of Australia, 
Council for Scientific and Industrial Research, 
Section of Mathematical Statistics, University 
of Adelaide, South Australia, writes “During 
the last few years our work has increased con- 
siderably and long ago reached the stage where 
we could not cope with the influx. Increases 
of staff were demanded but I had great difficul- 
ty in securing the services of even two partly 
trained men, because the Universities here 
were doing little toward preparing people for 
statistics as a profession. In a desperate bid 
to get staff, the Council agreed to let me 
undertake the job of supervising courses in 
the University of Adelaide, so we started at 
the beginning of the current academic year. 
It was an uphill fight writing two courses of 
lectures, preparing exercises for practical 
classes, and trying to carry on with research 
at the same time. Fortunately, our efforts 
have been rewarded to some extent because 


we gained one really good recruit, and at the 
end of next year, after further intensive effort, 
we hope to appoint six more” . . . ERNEST 
E. BLANCHE has recently returned from 
Italy where he taught at the U. S. Army 
University at Florence. His new War Dept. 
assignment is with the Control Division, Army 
Service Forces, Pentagon . .. L. N. HAZEL 
formerly with Western Sheep Breeding Lab- 
oratory, U.S.D.A. Dubois, Idaho, is now with 
Kimber Poultry Breeding Farm, Niles, Cali- 
fornia. .. MAJOR WARREN H. LEONARD 
writes from Tokyo, “I could really write you 
an article on how the Japanese fail to use 
modern statistical methods.” He is Chief of 
the Agricultural Division which has four 
branches. S. C. SALMON is head of the
Agricultural Research Branch. His task is 
to investigate the experimental work in Japan 
and see that it has been converted back to 
peaceful ways. Mr. Leonard states, “Inci- 
dentally, there are from 375 to 400 experi- 
mental stations, branch stations, laboratories, 
and demonstration farms in Japan proper.” 
V. R. BOSWELL is also working with this 
group. . . MAJOR A. L. FINKNER, who visit- 
ed Mr. Leonard in December, is now back 
with the Bureau of Agricultural Economics 
in Raleigh, conducting cooperative work with 
the Institute of Statistics — a married man 
now! ... You have not been forgotten down 
there in Mexico, E. J. WELLHAUSEN. Your 
letter presenting a “new source of error” has 
received consideration by the young research
workers here. I do believe some of them want 
to help you try to solve that problem. Mr. 
Wellhausen reports, “I think I can justly 
claim to be the first person to have used any 
of the modern experimental designs in Mexico. 
They will do the job here too. This year we 
planted in different parts of Mexico 45 simple 
lattice experiments in corn yield tests each 
containing 49 varieties, 4 replications, 24 hill 
plots. The last month (Dec.) I have been 
busy harvesting corn. In one section of Mex- 
ico, near Cortazar, we employed women as 
harvesters. In these experiments I discovered 
a new source of error which even at my old 
age rather embarrassed me ... (censored).”
. . . Recent appointments to the staff of the 
Institute of Statistics at Raleigh include H. 


L. LUCAS from Cornell University, R. J. 
MONROE released from the Army, and W. G. 
COCHRAN from Iowa State College. Mr. 


Cochran is an Associate Director of the 
Institute. 


Officers of the American Statistical Association:
President, Isador Lubin; Directors, Ches-
Bliss, E. Grosvenor Plowman, Walter A.

A. Stouffer, Willard L. 

rp, Helen M. ei Vice-Presidents, F. 

Carmichael, S. S. Wilks, Dorothy Swaine
; Secretary-Treasurer, Lester S. Kel- 


Officers of the Biometrics Section: Shalomee. 


. Fertig, J. G. 


Editorial Committee for the Biometrics Bul- 
L. Anderson, I Bliss, W Cochran, 
Churchill H. W. G. W. 
Snedecor, C. P. Winsor. 


Material for the BULLETIN should be addressed
to the Chairman of the Editorial Committee,
Institute of Statistics, North Carolina
State College, Raleigh, N. C.; material for
Queries should go to “Queries,” Statistical 
Laboratory, Iowa State College, Ames, Iowa, 
or to any of the committee. 


tion Committee members; E. J. deBeer, A. E. 
Brandt, J. WEE «Osborne, J. W. 
Tukey. 


